A Bivariate Measure of Redundant Information

We define a measure of redundant information based on projections in the space of probability distributions. Redundant information between random variables is information that is shared between those variables. But in contrast to mutual information, redundant information denotes information that is shared about the outcome of a third variable. Formalizing this concept, and being able to measure it, is required for the non-negative decomposition of mutual information into redundant and synergistic information. Previous attempts to formalize redundant or synergistic information struggle to capture some desired properties. We introduce a new formalism for redundant information and prove that it satisfies all the necessary properties outlined in earlier work, as well as an additional criterion that we propose to be necessary to capture redundancy. We also demonstrate the behaviour of this new measure for several examples, compare it to previous measures and apply it to the decomposition of transfer entropy.

I. INTRODUCTION

In this paper we present a new formalism for redundant information, measuring for three (finite) random variables X, Y and Z how much information the random variable X contains about Z that is also contained in Y. Information, in this paper, is based on Shannon entropy [m and formalizes how much information one variable contains about another; mutual information is the established formalism to quantify this (see Q for a detailed account).

A naive extension of mutual information to information shared among multiple variables faces several problems. Since mutual information only measures the amount of information one variable contains about another, it is unclear whether two variables X and Y, which both contain information about Z, actually contain the "same" information. Alternatively, we could ask how much additional information (e.g. reduction in entropy) about Z we would get from X if we already knew Y. This can be formalized as conditional mutual information $I(Z;X|Y) = I(Z;X,Y) - I(Z;Y)$. Thus one might think that $I(Z;X) - I(Z;X|Y)$, also called interaction information [7], is a candidate for a measure of redundant information. The problem is that it also captures the synergy between X and Y in the same measurement: in some cases, e.g. for binary variables with Z being the outcome of an XOR combination of X and Y, each variable by itself contains no information about Z, but both taken together do contain information, which would be detected by the conditional mutual information. But we want redundant information only to be present if this information about Z is present in each variable on its own.

Redundant as well as synergistic information is information about the output variable contained in both variables; redundant information is directly available in each input variable, whereas synergistic information is only available in the joint variable of the inputs. As we saw, interaction information cannot distinguish between redundant information and synergistic information, and is therefore ill-suited for this purpose.
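The XOR argument can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper names `entropy` and `mutual_information` and the dictionary encoding of the joint distribution are illustrative assumptions. It evaluates $I(Z;X) - I(Z;X|Y)$ for uniform binary inputs with Z = X XOR Y.

```python
from math import log2
from collections import defaultdict

def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal over the coordinates in idx."""
    marg = defaultdict(float)
    for outcome, prob in joint.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def mutual_information(joint, a, b):
    """I(A; B) = H(A) + H(B) - H(A, B) for coordinate tuples a and b."""
    return entropy(joint, a) + entropy(joint, b) - entropy(joint, a + b)

# Coordinates (hypothetical encoding): 0 = X, 1 = Y, 2 = Z.
xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

i_zx = mutual_information(xor_joint, (2,), (0,))
# Conditional mutual information via I(Z;X|Y) = I(Z;X,Y) - I(Z;Y).
i_zx_given_y = (mutual_information(xor_joint, (2,), (0, 1))
                - mutual_information(xor_joint, (2,), (1,)))
interaction = i_zx - i_zx_given_y
print(i_zx, i_zx_given_y, interaction)
```

Under these assumptions the single-source term vanishes while the conditional term is one bit, so the difference is negative; a quantity that can go negative through synergy cannot serve as a redundancy measure on its own.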

In general, we want a redundant information formalism that quantifies how much Shannon information about the outcome of a multivariate mechanism a variable provides on its own that is also provided by all other variables.



II. RELATED WORK 

Studies of synergies and redundancies have received attention in several areas including computational neuroscience d, d, [13, H^l and genetic regulatory networks [21, 22]. However, there seems to be no agreement on how best to measure redundancy and synergy. A detailed overview of the requirements for a measure of synergy and redundancy, as well as a comprehensive overview of possible candidate measures, can be found in [15].

Generalizations of mutual information have been proposed as measures of redundant information in the literature. One of them is total correlation, also called multi-information, which measures all dependencies among the individual variables Q. Another generalization is called interaction information (as used in the introductory example in Section I), measuring the information that is shared among the variables of the system, but not shared by any subset of the variables. However, neither measure explains the structure of multivariate information in terms of atomic information quantities shared between variables. The former only quantifies the dependencies, whereas the latter has the problem of possibly being negative. Therefore, interaction information cannot distinguish between a system of independent variables and a system where redundancies and synergies between variables compensate each other. Thus, it also fails to capture the precise structure of multivariate mutual information [Ta, 30].

Other measures, like interaction complexity [l^, give a good insight into the structure of interactions among random variables. However, interactions and redundancy, though related, are not the same, as interaction complexity does not fulfill the criteria stated in [29]. Moreover, measures of information flow [H, which are able to measure the overall amount of causal information flow, still struggle with over-determination (i.e. the measurement of redundant causal information flow), which is closely related to the problem of identifying redundant information.

A new approach addressing these problems was introduced by Williams and Beer [30]. It introduces a non-negative decomposition of multivariate mutual information terms $I(Z; X_1, \ldots, X_k)$. The decomposition captures all redundancies and synergies between all possible subsets of the variables $X_1, \ldots, X_k$ with respect to another random variable Z. Thus, the decomposition is able to reveal the atomic structure of the information that is shared by the variables $X_1, \ldots, X_k$ and Z.

Williams and Beer's decomposition can be applied to other information theoretic measures like transfer entropy as well. This gives further insight into the information transfer between processes by distinguishing state-independent information transfer from state-dependent information transfer [sH ].

The information decomposition relies on a measure of
redundancy [30]. Redundancy quantities then become
the "building blocks" of the construction. Information 
in the sense of Shannon's information theory, as used 
here, always denotes a measure of information that one 
variable contains about another. The notion of redun- 
dancy then translates to information theoretic terms as 
the information that two variables share about another 
variable. 

We will argue that the redundancy measure proposed by Williams and Beer, while exhibiting a number of essential properties needed to formalize redundancy, does not capture the concept of redundancy in a fully satisfactory way. These problems have been noted by Griffith [15], who recently proposed [16] a synergy/redundancy measure based on intrinsic conditional information [23], which shares similarities with an information bottleneck.

We propose a different measure for the bivariate case which addresses our concerns and we compare it to the existing measures [l^, 30]. The measure is based on a geometric argument and we will show that it fulfils all axioms required by Williams for a redundancy measure [29]. We also demonstrate that the non-negativity of the information decomposition is still guaranteed when using our measure. Furthermore, we will argue in favour of an additional axiom that any measure of redundancy has to fulfil.



A. Minimal Information as a Measure of 
Redundancy 

As mentioned above, the term redundancy has been used in several contexts denoting different quantities. Here, we specifically consider information about another random variable that is shared among several random variables, and we mean the same "piece" of information. A candidate measure for this quantity is called minimal information and denoted by $I_{\min}$ [30].

Given a set of finite random variables $X_V = \{X_1, \ldots, X_n\}$ with index set $V = \{1, \ldots, n\}$ and a finite random variable Z, with values from $\mathcal{X}_1 \times \cdots \times \mathcal{X}_n$ and $\mathcal{Z}$ respectively, we denote the mutual information between Z and $X_V$ as follows:

$$I(Z; X_V) := I(Z; X_1, \ldots, X_n). \tag{1}$$



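Following [30], we now define the (non-negative) specific information [12], the increase in likelihood (or reduction in surprise) of the outcome of a specific event, where $A \subseteq V$, by

$$I(Z = z; X_A) := \sum_{x_A} p(x_A \mid z) \left[ \log \frac{1}{p(z)} - \log \frac{1}{p(z \mid x_A)} \right] \tag{2}$$
$$= D_{KL}\bigl(p(x_A \mid z) \,\|\, p(x_A)\bigr), \tag{3}$$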



where $D_{KL}(\cdot \,\|\, \cdot)$ is the usual Kullback-Leibler divergence. This is then used by Williams and Beer to define the minimal information a set of random variables contains about the outcome as

$$I_{\min}(Z; A_1, \ldots, A_k) := \sum_z p(z) \min_{A_i} I(Z = z; X_{A_i}). \tag{4}$$

This measure is obviously non-negative and, in fact, positive if all variables contain some information about a specific outcome (for outcomes having probabilities which do not vanish).

For the bivariate case we will change the notation slightly and use the random variables directly instead of the index set notation, so instead of $I_{\min}(Z; A_1, A_2)$, where $A_1$ and $A_2$ are index sets of some collection of random variables, we will directly write $I_{\min}(Z; X, Y)$.



B. Redundancy Axioms 

In [29], Williams states three axioms any redundancy measure has to fulfill. For any redundancy measure $I_\cap(Z; A_1, \ldots, A_k)$ the following must hold:

Symmetry: $I_\cap$ is symmetric with respect to the $A_i$'s.

Self-Redundancy: $I_\cap(Z; A) = I(Z; X_A)$.

Monotonicity:

$$I_\cap(Z; A_1, \ldots, A_{k-1}, A_k) \le I_\cap(Z; A_1, \ldots, A_{k-1})$$

with equality if $A_{k-1} \subseteq A_k$.

From these axioms follows the non-negativity of the redundancy measure, and that it is bounded above by the mutual information between Z and each source. To prove this, note that the $A_i$ are subsets of V that could be empty, and for consistency $I_\cap(Z; \emptyset) = 0$ by definition. It is easy to check that all three axioms are fulfilled by the measure $I_{\min}$ [30].



C. Why Minimal Information is not Capturing 
Redundancy 

This measure contradicts a basic intuition about redundancy. Let us consider the case with two binary input variables X, Y (i.e. $\mathcal{X} = \mathcal{Y} = \{0,1\}$) that are independent and uniformly distributed, and where Z = (X,Y) is an unaltered copy of both variables, i.e. the joint variable of X and Y. Now we expect that there should be no redundancy between X and Y with regard to Z, because we know that X and Y are independent, so the information contained about Z in X and Y respectively is clearly not the same. However, we have $I_{\min}(Z; X, Y) = 1$ bit.

This happens because for each outcome of X or Y we observe a reduction of entropy regarding an outcome z (i.e. the specific information between X and z, as well as between Y and z, is positive). However, we ignore that even though X and Y give the same amount of information about an outcome z, they tell something different about the change of the distribution p(z) after an observation in X or Y has been made. In this particular example X gives information about the first component of Z, while Y gives information about the second component of Z.

More precisely, the a posteriori distributions of Z, $p(z|x)$ and $p(z|y)$, when either X or Y has been observed, give a different kind of information (have different content) even though they give the same amount of information. The core idea therefore is to separate the contributions of X and Y by adopting a geometric view in the space of probability distributions over Z.
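The copy example can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the function names and the dictionary encoding `{(x, y, z): probability}` are illustrative and not taken from the paper. It evaluates the minimal information of Eq. (4) for two sources, using the specific information of Eq. (2).

```python
from math import log2
from collections import defaultdict

def marginal(joint, idx):
    """Marginalize a joint dict {(x, y, z): prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, prob in joint.items():
        out[tuple(outcome[i] for i in idx)] += prob
    return dict(out)

def specific_information(joint, src, z):
    """I(Z = z; source) = sum_s p(s|z) * log2(p(z|s) / p(z)); Z is coordinate 2."""
    p_z = marginal(joint, (2,))
    p_s = marginal(joint, (src,))
    p_sz = marginal(joint, (src, 2))
    total = 0.0
    for (s, zz), p in p_sz.items():
        if zz != z or p == 0.0:
            continue
        total += (p / p_z[(z,)]) * log2((p / p_s[(s,)]) / p_z[(z,)])
    return total

def i_min(joint):
    """Minimal information for the two sources at coordinates 0 and 1."""
    p_z = marginal(joint, (2,))
    return sum(pz * min(specific_information(joint, src, z) for src in (0, 1))
               for (z,), pz in p_z.items() if pz > 0)

# Independent uniform bits X, Y and Z = (X, Y).
copy_joint = {(x, y, (x, y)): 0.25 for x in (0, 1) for y in (0, 1)}
print(i_min(copy_joint))  # 1.0 bit, although X and Y are independent
```

Each source contributes one bit of specific information about every outcome z, so the minimum is one bit for every z even though the two bits concern different components of Z.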



III. A NEW MEASURE OF REDUNDANT 
INFORMATION 

To define a new (bivariate) redundancy measure we will take a geometric view on informational quantities. Information geometry is a powerful tool-set to investigate information theoretic questions in the context of Riemannian manifolds (T], Q. Geometric arguments and algorithms have profound applications to information theory and statistics [11] and have been successfully employed to construct information theoretic multivariate interaction measures [ist. Information geometry deals with statistical manifolds of probability distributions equipped with the Fisher metric [1]. The Kullback-Leibler divergence is a divergence function on the statistical manifold and thus certain helpful properties and theorems, such as the Pythagorean Theorem, can be used. Here, we will introduce concepts of information geometry only as needed, as most arguments can be made on an ad-hoc basis.



A. Additional Axiom 

Before we start with the construction of the measure, we want to address the shortcoming identified above. For this purpose, we propose to add an additional axiom to the axioms from Section II B. We call it the identity property, as it states how redundancy should behave with respect to a joint random variable of identical copies of the two source variables. It requires that for any redundancy measure $I_\cap$

$$I_\cap\bigl((X_{A_1}, X_{A_2}); A_1, A_2\bigr) = I(X_{A_1}; X_{A_2}). \tag{5}$$

The idea behind this additional axiom is that if the (bivariate) mechanism we are considering is just copying the input, the redundancy must be exactly the mutual information between the variables. Given a multivariate redundancy measure, the monotonicity axiom then implies that the multivariate redundancy is bounded above by the minimum of the pairwise mutual information terms.



B. Construction of a Redundant Information 
Measure 

The redundancy measure we will construct is based on the notion of projected information, which we will introduce shortly. We will begin with the definition of a bivariate redundancy measure $I_{\mathrm{red}}$, i.e. we will measure the redundancy between two sources X and Y with respect to Z, denoted by $I_{\mathrm{red}}(Z; X, Y)$.



1. Preliminaries 

In what follows, let $\Delta(\mathcal{Z})$ denote the space of all probability distributions over Z. An information projection is now defined as the minimization of the Kullback-Leibler divergence between a probability distribution $p \in \Delta(\mathcal{Z})$ and a subset $B \subset \Delta(\mathcal{Z})$:

$$\pi_B(p) := \operatorname*{arg\,min}_{r \in B} D_{KL}(p \,\|\, r). \tag{6}$$

The Kullback-Leibler divergence is not symmetric, therefore it is possible to define a dual projection $\pi_B^*(p)$ where the parameters of $D_{KL}(\cdot \,\|\, \cdot)$ are reversed (in [l^], $\pi_B(p)$ is called reverse information projection and $\pi_B^*(p)$ information projection). Here we will exclusively use the projection $\pi_B(p)$.

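For $B \subset \Delta(\mathcal{Z})$, we denote the convex closure of B in $\Delta(\mathcal{Z})$ by

$$C_{cl}(B) = \{\lambda p + (1 - \lambda) q \mid p, q \in B, \lambda \in [0,1]\}. \tag{7}$$

As $\Delta(\mathcal{Z})$ is convex we have $C_{cl}(B) \subset \Delta(\mathcal{Z})$. Observing an event x in X or y in Y leads to a distribution over Z, $p(\cdot \mid x) \in \Delta(\mathcal{Z})$ and $p(\cdot \mid y) \in \Delta(\mathcal{Z})$ respectively. Let

$$\langle X \rangle_Z := \{p(\cdot \mid x) : x \in \mathcal{X}\} \tag{8}$$

denote the set of all conditional distributions of Z for the different events of X. Because the marginal distribution over Z is a convex combination of the conditional distributions, namely

$$p(z) = \sum_x p(z \mid x)\, p(x), \tag{9}$$

we have that the space of distributions over X, i.e. $\Delta(\mathcal{X})$, is embedded in $\Delta(\mathcal{Z})$ by the convex set

$$C_{cl}(\langle X \rangle_Z) = C_{cl}(\{p(\cdot \mid x) : x \in \mathcal{X}\}). \tag{10}$$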

The convex closure of $\langle X \rangle_Z$ in $\Delta(\mathcal{Z})$ now contains all possible marginals p(z) if we do not know the actual distribution of X, but where the mechanism (the conditional distribution) is known. For example, the problem of finding the channel capacity between two random variables X and Z can now be translated into finding the point in the convex closure that maximizes its Kullback-Leibler divergence from all extremal points $p(\cdot \mid x)$ of the convex closure (weighted by the respective probabilities p(x)), as this is equivalent to maximizing the mutual information between X and Z.



2. Projective Information 

Using information projections we can now project the conditionals of one variable onto the convex closure of the other. We denote this projection by

$$p_{(x \searrow Y)}(\cdot) := \pi_{C_{cl}(\langle Y \rangle_Z)}\bigl(p(\cdot \mid x)\bigr). \tag{11}$$

The projection is not guaranteed to be unique (for uniqueness, the set we are projecting onto would need to be log-convex and not convex [13]); however, this does not matter for our purposes, as we will see in the next lemma. Now, we define the projected information of X onto Y with respect to Z as

$$I_Z^{\pi}(X \searrow Y) := \sum_{z,x} p(z,x) \log \frac{p_{(x \searrow Y)}(z)}{p(z)}. \tag{12}$$

The rationale behind this construction is that the projected information quantifies the amount of information that two variables share with each other, here X and Z, that can be expressed in terms of the information Y shares with Z (we are projecting onto Y). This is illustrated for binary input variables in FIG. 1.

Lemma 1. Projected information $I_Z^{\pi}(X \searrow Y)$ is well-defined, finite and non-negative.

Proof. First, note that projected information can be written as the difference of two Kullback-Leibler divergences

$$I_Z^{\pi}(X \searrow Y) = \sum_x p(x) \Bigl[ D_{KL}\bigl(p(z \mid x) \,\|\, p(z)\bigr) - D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \Bigr].$$

Therefore, if the projection is not unique, projected information only takes the KL-divergence into account, which is the same for all possible solutions of the minimization problem defining the projection. Now we have $D_{KL}(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)) \le D_{KL}(p(z \mid x) \,\|\, p(z))$ for all $x \in \mathcal{X}$ because of $p(z) \in C_{cl}(\langle Y \rangle_Z)$ and the definition of $p_{(x \searrow Y)}(z)$ as the distance-minimizing distribution to $p(\cdot \mid x)$ in $C_{cl}(\langle Y \rangle_Z)$. Hence $I_Z^{\pi}(X \searrow Y) \ge 0$. Furthermore $I(X; Z) = \sum_x p(x) D_{KL}(p(z \mid x) \,\|\, p(z)) < \infty$. □
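The first step of the proof can be spelled out as a short worked expansion. It uses only the definitions of Eqs. (11) and (12), writing $I_Z^{\pi}(X \searrow Y)$ for the projected information and $p_{(x \searrow Y)}$ for the projection of $p(\cdot \mid x)$, and inserts $p(z \mid x)$ into the logarithm:

```latex
\begin{align*}
I_Z^{\pi}(X \searrow Y)
  &= \sum_{z,x} p(z,x) \log \frac{p_{(x \searrow Y)}(z)}{p(z)} \\
  &= \sum_{x} p(x) \sum_{z} p(z \mid x)
     \left[ \log \frac{p(z \mid x)}{p(z)}
          - \log \frac{p(z \mid x)}{p_{(x \searrow Y)}(z)} \right] \\
  &= \sum_{x} p(x) \left[ D_{KL}\bigl(p(z \mid x) \,\|\, p(z)\bigr)
     - D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \right].
\end{align*}
```

The first divergence, averaged over x, is the mutual information between X and Z, so the projected information is that mutual information reduced by the average divergence lost in the projection.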

3. Definition of Bivariate Redundancy 

The (bivariate) redundancy measure is now simply defined as the minimum of both projected information terms

$$I_{\mathrm{red}}(Z; X, Y) := \min\{I_Z^{\pi}(X \searrow Y),\, I_Z^{\pi}(Y \searrow X)\}. \tag{13}$$

At this point we can take the minimum over both values because we already corrected for the change of the distributions in different directions by projecting the conditionals. This is different to the approach taken by Williams and Beer [30], where the minimization does not consider that events in different source variables may change the distribution of the outcome in different directions in the geometrical space of distributions. Moreover, we define self-redundancy explicitly as

$$I_{\mathrm{red}}(Z; X) := I_{\mathrm{red}}(Z; X, X) \tag{14}$$
$$= I_Z^{\pi}(X \searrow X). \tag{15}$$

FIG. 1. Construction of projective information for binary input variables.

4. The Proposed Measure is a Bivariate Redundancy Measure

To show that this is actually a redundancy measure, we have to show that it fulfils the four axioms (symmetry, self-redundancy, monotonicity and identity). Symmetry is obviously fulfilled; self-redundancy is also very quick to prove:

$$I_{\mathrm{red}}(Z; X) = I_Z^{\pi}(X \searrow X) \tag{16}$$
$$= \sum_{z,x} p(z,x) \log \frac{p_{(x \searrow X)}(z)}{p(z)} \tag{17}$$
$$= \sum_{z,x} p(z,x) \log \frac{p(z \mid x)}{p(z)} \tag{18}$$
$$= I(Z; X). \tag{19}$$

For the monotonicity axiom we first need to show $I_{\mathrm{red}}(Z; X, Y) \le I(Z; X)$. Using the expression of projected information as a difference of Kullback-Leibler divergences we get

$$I_{\mathrm{red}}(Z; X, Y) \le I_Z^{\pi}(X \searrow Y) \tag{20}$$
$$= \sum_x p(x) \Bigl[ D_{KL}\bigl(p(z \mid x) \,\|\, p(z)\bigr) - D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \Bigr] \tag{21}$$
$$= I(Z; X) - \sum_x p(x)\, D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr).$$

Hence it follows that $I_{\mathrm{red}}(Z; X, Y) \le I(Z; X)$, as the KL-divergence is non-negative. To show equality holds if $X \subseteq Y$ we will first need the following two lemmas.



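Lemma 2. For all $x \in \mathcal{X}$ and random variables Y and W,

$$\sum_z p(z \mid x) \bigl( \log p_{(x \searrow (Y,W))}(z) - \log p_{(x \searrow Y)}(z) \bigr) \ge 0. \tag{22}$$

Proof. Let $x \in \mathcal{X}$. As $C_{cl}(\langle Y \rangle_Z) \subseteq C_{cl}(\langle (Y,W) \rangle_Z)$ (note that $p(y \mid z) = \sum_w p(y, w \mid z)$), we have due to the definition of the projection that

$$\sum_z p(z \mid x) \log \frac{p(z \mid x)}{p_{(x \searrow (Y,W))}(z)} \le \sum_z p(z \mid x) \log \frac{p(z \mid x)}{p_{(x \searrow Y)}(z)} \tag{23}$$
$$\sum_z p(z \mid x) \log p_{(x \searrow (Y,W))}(z) \ge \sum_z p(z \mid x) \log p_{(x \searrow Y)}(z). \tag{24}$$
□

Lemma 3. For all $(y, w) \in \mathcal{Y} \times \mathcal{W}$,

$$\sum_z p(z \mid y, w) \bigl( \log p_{((y,w) \searrow X)}(z) - \log p_{(y \searrow X)}(z) \bigr) \ge 0. \tag{25}$$

Proof. By definition, $r = p_{((y,w) \searrow X)}$ minimizes $D_{KL}(p(z \mid y, w) \,\|\, r)$, therefore

$$\sum_z p(z \mid y, w) \log \frac{p(z \mid y, w)}{p_{((y,w) \searrow X)}(z)} \le \sum_z p(z \mid y, w) \log \frac{p(z \mid y, w)}{p_{(y \searrow X)}(z)} \tag{26}$$
$$\sum_z p(z \mid y, w) \log p_{((y,w) \searrow X)}(z) \ge \sum_z p(z \mid y, w) \log p_{(y \searrow X)}(z). \tag{27}$$
□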



Now the following proposition proves the missing piece for the monotonicity.

Proposition 4. $I_{\mathrm{red}}(Z; X, Y) \le I_{\mathrm{red}}(Z; X, (Y, W))$.

Proof. From Lemma 2 it follows directly that $I_Z^{\pi}(X \searrow Y) \le I_Z^{\pi}(X \searrow (Y,W))$; furthermore, from Lemma 3, $I_Z^{\pi}(Y \searrow X) \le I_Z^{\pi}((Y,W) \searrow X)$. Hence, we conclude $I_{\mathrm{red}}(Z; X, Y) \le I_{\mathrm{red}}(Z; X, (Y, W))$. □

Now it only remains to show that the measure also fulfils our new identity property, namely

$$I_{\mathrm{red}}((X,Y); X, Y) = I(X; Y). \tag{28}$$

First we need the following lemma.

Lemma 5. If Z = (X, Y) and (x', y') denotes an event of Z, then $p_{(y' \searrow X)}(x', y') = p_{(x' \searrow Y)}(x', y') = p(x' \mid y')\, p(y' \mid x')$.

Proof. Let $r \in C_{cl}(\langle X \rangle_Z)$; it is of the form

$$r(x', y') = \sum_x a_x\, p(x', y' \mid x) = a_{x'}\, p(y' \mid x'), \tag{29}$$

where $a_x \ge 0$ and $\sum_x a_x = 1$. We also have

$$D_{KL}\bigl(p(\cdot \mid y) \,\|\, r\bigr) = \sum_{x', y'} p(x', y' \mid y) \log \frac{p(x', y' \mid y)}{a_{x'}\, p(y' \mid x')} \tag{30}$$
$$= \sum_{x'} p(x' \mid y) \log \frac{p(x' \mid y)}{a_{x'}\, p(y \mid x')}. \tag{31}$$

A simple calculation shows that the point $a_{x'} = p(x' \mid y)$ fulfills the Karush-Kuhn-Tucker (KKT) conditions [I^ for the minimization of Eq. (31) with respect to the vector $a_{x'}$ and the simplex constraints. The KL-divergence is convex in the second parameter and thus it follows from the KKT conditions that $a_{x'} = p(x' \mid y)$ is a global solution for the constrained minimization of the KL-divergence $D_{KL}(p(\cdot \mid y) \,\|\, r)$ parametrized by $a_x$ as in Eq. (31), and in turn $r(x', y') = p(x' \mid y)\, p(y' \mid x')$. If we now set y' = y then we get $p_{(y' \searrow X)}(x', y') = p(x' \mid y')\, p(y' \mid x')$ and $p_{(x' \searrow Y)}(x', y') = p(x' \mid y')\, p(y' \mid x')$ respectively. □

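And hence we can conclude our proof with the following proposition:

Proposition 6. $I_{(X,Y)}^{\pi}(X \searrow Y) = I_{(X,Y)}^{\pi}(Y \searrow X) = I(X; Y)$.

Proof. Without loss of generality,

$$I_{(X,Y)}^{\pi}(X \searrow Y) = \sum_{x', y', x} p(x', y', x) \log \frac{p_{(x \searrow Y)}(x', y')}{p(x', y')} \tag{32}$$
$$= H(X, Y) + \sum_{x', y'} p(x', y') \log p_{(x' \searrow Y)}(x', y') \tag{33}$$
$$= H(X, Y) + \sum_{x, y} p(x, y) \log [p(x \mid y)\, p(y \mid x)]$$
$$= H(X, Y) - H(X \mid Y) - H(Y \mid X) \tag{34}$$
$$= I(X; Y). \tag{35}$$
□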



Thus $I_{\mathrm{red}}$ is a good candidate for measuring redundancy (in terms of redundancy with respect to some target variable).
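To make the construction concrete, here is a minimal numerical sketch restricted to a binary target Z. It is an illustration under stated assumptions rather than the authors' implementation: for binary Z a distribution is a single number p(z = 1), the convex closure of the conditionals is an interval, and the projection that minimizes the Kullback-Leibler divergence over that interval reduces to clamping. The general case needs a convex optimizer, which is not shown. All function names and the encoding `{(x, y, z): probability}` are hypothetical.

```python
from math import log2
from collections import defaultdict

def marginal(joint, idx):
    """Marginalize a joint dict {(x, y, z): prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, prob in joint.items():
        out[tuple(outcome[i] for i in idx)] += prob
    return dict(out)

def cond_z1(joint, src):
    """Return {s: (p(s), p(z=1|s))} for the source at coordinate src."""
    p_s = marginal(joint, (src,))
    p_sz = marginal(joint, (src, 2))
    return {s: (ps, p_sz.get((s, 1), 0.0) / ps)
            for (s,), ps in p_s.items() if ps > 0}

def projected_information(joint, src, onto):
    """Projected information of src onto `onto` w.r.t. a binary Z, in bits."""
    p_z1 = marginal(joint, (2,)).get((1,), 0.0)
    others = [q for _, q in cond_z1(joint, onto).values()]
    lo, hi = min(others), max(others)  # convex closure is the interval [lo, hi]
    total = 0.0
    for ps, q in cond_z1(joint, src).values():
        r = min(max(q, lo), hi)  # KL projection onto an interval: clamp
        for pzx, proj, marg in ((q, r, p_z1), (1 - q, 1 - r, 1 - p_z1)):
            if pzx > 0:
                total += ps * pzx * log2(proj / marg)
    return total

def i_red(joint):
    """Minimum of the two projected information terms."""
    return min(projected_information(joint, 0, 1),
               projected_information(joint, 1, 0))

xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
and_joint = {(x, y, x & y): 0.25 for x in (0, 1) for y in (0, 1)}
print(i_red(xor_joint), i_red(and_joint))
```

For the XOR gate every conditional equals the marginal, so the projected information vanishes; for the AND gate each conditional already lies in the other source's interval, so the projection leaves it unchanged and the result equals I(Z;X).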



IV. COMPARISONS 

Now that we have constructed a bivariate redundancy 
measure, we will present a few examples of redundancy 
calculations. 



A. Relation to Minimal Information 

There are some cases where $I_{\mathrm{red}}$ and $I_{\min}$ coincide and we will have a look at some of these cases later in Section IV C. In general there is a tendency of $I_{\min}$ to overestimate redundancy, and in our examples it seems that $I_{\min}$ is an upper bound for $I_{\mathrm{red}}$ in most cases. There are a few exceptions, but it is not yet clear for which cases these exceptions appear or whether they are due to numerical instabilities. The overestimation of redundancy by $I_{\min}$ becomes predominant if the dimension of Z is increased (see FIG. 3). The explanation for this is that the higher the dimension of the space gets, the larger the error becomes which results from not taking directionality into account.



B. Decomposition of Mutual Information 

In [30] Williams and Beer introduce partial information atoms (PI-atoms) as a way to decompose multivariate mutual information into non-negative terms. These terms can be defined for any multivariate redundancy measure and denote redundant and synergistic contributions between several variables of a set of random variables R towards another random variable Z. They are denoted by $\Pi_R(Z; \alpha)$ where $\alpha$ is a set of subsets of the base set of random variables R. As this construction is possible with any redundancy measure, we will use $\Pi_R(Z; \alpha)$ to denote the PI-atoms based on $I_{\min}$ as a redundancy measure, thereby staying consistent in the notation with [30]. The primed version $\Pi'_R(Z; \alpha)$ on the other hand will denote the decomposition using the redundancy measure $I_{\mathrm{red}}$ introduced here.

FIG. 2. PI-diagram for the decomposition of the mutual information between Z and X, Y into PI-atoms. {X, Y} denotes the synergistic, {X}, {Y} the unique and {X}{Y} the redundant part of the mutual information.

[FIG. 3 panels: (d) |Z| = 8, (e) |Z| = 20, (f) |Z| = 40; axis label $I_{\min}$.]

FIG. 3. Comparison of $I_{\min}$ and $I_{\mathrm{red}}$ for random distributions p(x, y, z) with $|\mathcal{X}| = |\mathcal{Y}| = 3$ and different sizes of Z. Note that as the dimension goes up, $I_{\min}$ gets larger in comparison to $I_{\mathrm{red}}$. The distributions are initialized with random values, additionally the probability of each event being set to 0 with probability 0.5.

In the bivariate case, this leads to the decomposition of mutual information I(Z; X, Y) into four partial information atoms. Here we have R = {X, Y}. Now, following [30] there are four atomic terms,

• $\Pi'_R(Z; \{X\}\{Y\})$, which is the redundant information contained in X and Y about Z,

• $\Pi'_R(Z; \{X\})$ and $\Pi'_R(Z; \{Y\})$, the unique information about Z which is only contained in X or Y respectively,

• and $\Pi'_R(Z; \{X, Y\})$, synergistic information, the information about Z that is only available if X and Y are both known.

The sum of these terms is exactly the mutual information between Z and all sources, i.e.

$$I(Z; X, Y) = \Pi'_R(Z; \{X\}\{Y\}) + \Pi'_R(Z; \{X\}) + \Pi'_R(Z; \{Y\}) + \Pi'_R(Z; \{X, Y\}), \tag{36}$$

as well as

$$I(Z; X) = \Pi'_R(Z; \{X\}\{Y\}) + \Pi'_R(Z; \{X\}) \tag{37}$$

and for Y respectively. Still following [30], but having replaced $I_{\min}$ by $I_{\mathrm{red}}$, we get $\Pi'_R(Z; \{X\}\{Y\}) = I_{\mathrm{red}}(Z; X, Y)$ and $\Pi'_R(Z; \{X\}) = I(Z; X) - I_{\mathrm{red}}(Z; X, Y)$. Finally, for the synergistic term

$$\Pi'_R(Z; \{X, Y\}) = I(Z; X, Y) - \Pi'_R(Z; \{X\}) - \Pi'_R(Z; \{Y\}) - \Pi'_R(Z; \{X\}\{Y\}) \tag{38}$$
$$= I(Z; X, Y) - I(Z; X) - I(Z; Y) + I_{\mathrm{red}}(Z; X, Y). \tag{39}$$
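The bookkeeping of Eqs. (36) to (39) is easy to express in code. The sketch below is illustrative: the function name `pi_atoms` and the dictionary keys are hypothetical, and the inputs are assumed to have been computed elsewhere. The example values are those of an XOR gate with uniform inputs, where each source alone carries no information about Z.

```python
def pi_atoms(i_joint, i_x, i_y, i_red):
    """Bivariate PI-atoms following Eqs. (36)-(39); all quantities in bits.

    i_joint = I(Z;X,Y), i_x = I(Z;X), i_y = I(Z;Y), i_red = redundancy.
    """
    return {
        "redundant": i_red,
        "unique_x": i_x - i_red,
        "unique_y": i_y - i_red,
        "synergistic": i_joint - i_x - i_y + i_red,
    }

# XOR gate with uniform inputs: I(Z;X,Y) = 1, I(Z;X) = I(Z;Y) = 0, redundancy 0.
atoms = pi_atoms(i_joint=1.0, i_x=0.0, i_y=0.0, i_red=0.0)
print(atoms)
```

By construction the four atoms always sum to I(Z;X,Y); whether each atom is non-negative depends on the redundancy measure supplied.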

Now this decomposition is not non-negative by default and this needs to be shown for the specific redundancy measure used. It is shown by Williams in [29] for the decomposition using $I_{\min}$. Here, we will show it for the bivariate case with $I_{\mathrm{red}}$ as redundancy measure. Firstly, $I_{\mathrm{red}}(Z; X, Y)$ is non-negative, as shown earlier. Furthermore, it follows from the axioms of the redundancy measure that $I_{\mathrm{red}}(Z; X, Y) \le I(X; Z)$ and, with the same argument, $I_{\mathrm{red}}(Z; X, Y) \le I(Y; Z)$, which immediately implies that the unique information terms are non-negative. The following lemma now gives the non-negativity of the synergistic term:

Lemma 7. $I(Z; X, Y) - I(Z; X) - I(Z; Y) + I_Z^{\pi}(X \searrow Y) \ge 0$.

Proof. We can reformulate the left hand side

$$I(Z; X, Y) - I(Z; X) - I(Z; Y) + I_Z^{\pi}(X \searrow Y) \tag{40}$$
$$= I(Z; X, Y) - I(Z; Y) - \sum_x p(x)\, D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \tag{41}$$
$$= \sum_{x,y} p(x, y)\, D_{KL}\bigl(p(z \mid x, y) \,\|\, p(z \mid y)\bigr) - \sum_x p(x)\, D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \tag{42}$$
$$= \sum_x p(x) \Bigl( \sum_y p(y \mid x)\, D_{KL}\bigl(p(z \mid x, y) \,\|\, p(z \mid y)\bigr) - D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \Bigr) \tag{43}$$

and now by the convexity of the Kullback-Leibler divergence:

$$\ge \sum_x p(x) \Bigl( D_{KL}\Bigl( \sum_y p(y \mid x)\, p(z \mid x, y) \,\Big\|\, \sum_y p(y \mid x)\, p(z \mid y) \Bigr) - D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \Bigr) \tag{44}$$
$$= \sum_x p(x) \Bigl( D_{KL}\bigl(p(z \mid x) \,\|\, r(z \mid x)\bigr) - D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \Bigr) \tag{45}$$

where $r(z \mid x) := \sum_y p(y \mid x)\, p(z \mid y) \in C_{cl}(\langle Y \rangle_Z)$ and thus

$$D_{KL}\bigl(p(z \mid x) \,\|\, r(z \mid x)\bigr) - D_{KL}\bigl(p(z \mid x) \,\|\, p_{(x \searrow Y)}(z)\bigr) \ge 0 \quad \text{for all } x \in \mathcal{X}. \tag{46}$$
□

Given the non-negativity of the decomposition, we can visualize it using a PI-diagram as seen in FIG. 2. The whole circle represents the mutual information I(Z; X, Y) and the colored/shaded regions represent redundant (yellow/light shaded), unique (red/dark shaded) and synergistic (blue/medium shaded) information.

C. Examples

We will now go through some examples for the bivariate measure, in particular those discussed in [15], which are a good selection of test cases for the desired properties of a redundancy/synergy measure.

1. Copying - From Redundancy to Uniqueness

Our first example is a very simple mechanism which simply copies the binary input variables X and Y into Z, i.e. Z = (X, Y). However, we also add a control parameter $\lambda \in [0,1]$ which determines how correlated X and Y are, as follows: let W be a uniformly distributed binary random variable, $p(x \mid w) = \lambda \tfrac{1}{2} + (1 - \lambda)\delta_{xw}$ and $p(y \mid w) = \lambda \tfrac{1}{2} + (1 - \lambda)\delta_{yw}$. For $\lambda = 1$ we have that X and Y are independent and we recover the example "Unq (Unique Information)" from [15]. At the other extreme, $\lambda = 0$, we have that X and Y are identical copies of W and therefore Z is equivalent to W from an information theoretic point of view. This is also reflected in the decomposition, as in this case I(Z; X, Y) = I(W; X, Y) and $I_{\mathrm{red}}(Z; X, Y) = I_{\mathrm{red}}(W; X, Y)$, so we can see that this is the example "Rdn (Redundant Information)" from [15].

FIG. 4. Copy Example. Complete redundancy and complete uniqueness using $I_{\mathrm{red}}$. Panels: (a) Bayesian model, (b) PI-diagram for $\lambda = 0$, complete redundancy (Rdn), (c) PI-diagram for $\lambda = 1$, complete uniqueness (Unq).

FIG. 5. Comparison of total mutual information I(Z; X, Y), our redundancy measure $I_{\mathrm{red}}$ and $I_{\min}$ for varying values of $\lambda$ (horizontal axis), where $\lambda$ controls the correlation between X and Y. It can be seen that $I_{\min}$ measures a constant amount of redundancy and therefore does not distinguish between redundancy and uniqueness with varying $\lambda$ as desired, whereas $I_{\mathrm{red}}$ does.

By varying $\lambda$ we can vary the entropy of the outcome Z and at the same time exchange unique information for redundancy. FIG. 4 illustrates the decomposition at both extremal values of $\lambda$ and it can be seen that the resulting values of $I_{\mathrm{red}}$ coincide with the proposed values in [15]. The effect of changing $\lambda$ is shown in FIG. 5.
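The dependence on the control parameter can be traced with a short script. This is a sketch under stated assumptions: it relies on the identity property proved in Proposition 6, so that for the copy mechanism Z = (X, Y) the redundancy equals I(X;Y) and no projection has to be computed, and on I(Z;X,Y) = H(X,Y) for a deterministic copy. The function name `copy_example` is hypothetical.

```python
from math import log2

def copy_example(lam):
    """Return (I(Z;X,Y), redundancy) in bits for the copy mechanism Z = (X, Y).

    Assumes p(x|w) = lam/2 + (1 - lam) * delta(x, w), likewise for y, with W a
    uniform bit. The redundancy is taken to be I(X;Y) via the identity property.
    """
    def p_given_w(v, w):
        return lam / 2 + (1 - lam) * (1.0 if v == w else 0.0)

    p_xy = {(x, y): sum(0.5 * p_given_w(x, w) * p_given_w(y, w) for w in (0, 1))
            for x in (0, 1) for y in (0, 1)}
    p_x = {x: p_xy[(x, 0)] + p_xy[(x, 1)] for x in (0, 1)}
    p_y = {y: p_xy[(0, y)] + p_xy[(1, y)] for y in (0, 1)}

    total = -sum(p * log2(p) for p in p_xy.values() if p > 0)  # H(X,Y)
    redundancy = sum(p * log2(p / (p_x[x] * p_y[y]))
                     for (x, y), p in p_xy.items() if p > 0)   # I(X;Y)
    return total, redundancy

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, copy_example(lam))
```

At the first extreme the total information and the redundancy are both one bit; at the other the total is two bits and the redundancy vanishes, matching the two PI-diagrams of the copy example.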



2. XOR 

The XOR gate (⊕) is a classical example for the appearance of synergy, in the sense of the whole being more than the sum of the individuals. We expect to only observe synergistic information, as the result is only known if both inputs are available, and the uncertainty given one input is the same as given no input at all. Again the inputs are uniformly distributed binary random variables and $Z = X \oplus Y$. In fact, in this case we have $I_{\mathrm{red}}(Z; X, Y) = I_{\min}(Z; X, Y) = 0$ and get the purely synergistic decomposition as illustrated in FIG. 6. Note that $I_{\mathrm{red}}$ defines the redundancy; the other terms are all derived by the decomposition.



3. AND - Mechanisms at Work 

We now come to the And gate, Z = X AY. This 
turns out to be an interesting case, because it demon- 
strates the subtle difference between redundant informa- 
tion that is due to the "ignorance" of the mechanism with 
respect to the source, and redundancy that is already ap- 
parent in the sources. In 0,13 it is argued that van- 
ishing mutual information between the sources X and 




« Z 



(a) Pl-diagram (b) circuit diagram 

FIG. 6. XOR Example. A purely synergistic mechanism. 




• Z 



(a) Pl-diagram (b) circuit diagram 

FIG. 7. And Example. The total mutual information is 
I{Z;X,Y) = 0.811278. 

Y themselves implies vanishing redundant information (see footnote). This feature is also shared by the synergy measure introduced in [15]. However, here we would like to embrace a different view on redundant information: even if the sources are independent, there can be a correlation in the change of the distribution over Z given observations in X and Y respectively. Observing one input does not give any information about the other input, but part of the information gain about the distribution of the output can be the same as one gets from the other input alone. In particular, in the case of the And gate, observing a 0 in either input leads to p(z = 0) = 1. Calculating the redundancy for this example, we get I_red(Z; X, Y) = I_min(Z; X, Y) = 0.311278, so this is another example where minimal and redundant information coincide. FIG. 7 illustrates the decomposition of the total mutual information for this example.
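
The two numbers quoted for the And gate can be reproduced by enumeration. The sketch below computes the total mutual information and I_min, assuming the usual Williams and Beer definition of I_min as the expected minimum specific information (this definition is not restated in this section, so it is an assumption here). It does not implement I_red, and all function names are illustrative.

```python
from itertools import product
from math import log2

# Joint pmf over (x, y, z) for uniform independent inputs and z = x AND y.
joint = {(x, y, x & y): 0.25 for x, y in product((0, 1), repeat=2)}

def marginal(joint, idx):
    """Marginal pmf over the tuple positions listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(joint, a_idx, b_idx):
    """I(A; B) in bits; a_idx and b_idx select A and B from each outcome."""
    pa, pb = marginal(joint, a_idx), marginal(joint, b_idx)
    pab = marginal(joint, a_idx + b_idx)
    n = len(a_idx)
    return sum(p * log2(p / (pa[k[:n]] * pb[k[n:]]))
               for k, p in pab.items() if p > 0)

def specific_information(joint, z_idx, z_val, s_idx):
    """I(Z = z; S) = sum_s p(s|z) * log2(p(z|s) / p(z))."""
    pz, ps = marginal(joint, z_idx), marginal(joint, s_idx)
    pzs = marginal(joint, z_idx + s_idx)
    n = len(z_idx)
    total = 0.0
    for k, p in pzs.items():
        if k[:n] == z_val and p > 0:
            total += (p / pz[z_val]) * log2((p / ps[k[n:]]) / pz[z_val])
    return total

def i_min(joint, z_idx, sources):
    """Expected minimum specific information over the given sources."""
    pz = marginal(joint, z_idx)
    return sum(p * min(specific_information(joint, z_idx, z, s) for s in sources)
               for z, p in pz.items())

total_mi = mutual_information(joint, (2,), (0, 1))  # I(Z; X, Y)
redundancy_min = i_min(joint, (2,), [(0,), (1,)])   # I_min(Z; X, Y)
print(round(total_mi, 6), round(redundancy_min, 6))
```

Because the two inputs are symmetric, I_min coincides with I(Z; X) here, which matches the value 0.311278 stated in the text.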

We call redundant information that is only due to the mechanism, as is the case here, mechanistic redundancy. In contrast, we call redundant information that already appears in the inputs source redundancy. Redundancy in the source must already manifest itself in the mutual information between the inputs. We do not give a rigorous definition for these terms because, as can be seen in the next example, there are cases where it is not clear how to separate the two. However, if there is positive redundant information I_red > 0 but vanishing mutual information between the sources, we will attribute all redundant information to mechanistic redundancy.

Footnote: "However, because X1 and X2 are independent, [...], thus necessitating there is zero redundant information [...]." [15]



4. Summing Dice

Let us now consider an example where we throw two dice (cubic dice, with sides numbered from 0 to 5), represented by the random variables D1 and D2, and sum their results. There are several ways to sum the results: we could simply add the two results, which leads to values ranging from 0 to 10 where 5 is the most probable result and 0 and 10 are the least probable, or we could multiply the result of the first die by 6 to get a uniform distribution over all numbers from 0 to 35. Indeed, we will also look at all intermediate summations defined by R = αD1 + D2 where α ∈ {1, 2, 3, 4, 5, 6}. Our hypothesis was that for the direct summation (α = 1) there is a positive amount of redundancy between D1 and D2 with respect to R, because knowing the roll of one die gives "overlapping" information (in the same direction in the space of distributions) with the roll of the other die about the final result. The redundancy should then decrease as α is increased, up to the point where α = 6 and the sum of both dice rolls is isomorphic to the joint variable of the two dice rolls, i.e. 6D1 + D2 ≅ (D1, D2). Indeed, this is reflected in the redundancy I_red(R; D1, D2). In FIG. 8 we added an additional parameter λ that controls how correlated the two dice are, in the same way as λ was introduced in the copy example in Section IV C 1 to control the correlation between the input variables. For λ = 1 the dice are independent and it can be seen that the redundancy increases with decreasing α. At the other extreme, λ = 0, the dice are completely correlated. In this case the redundancy that already exists in the source (I(D1; D2) ≈ 2.58 bits) shadows all redundancy otherwise induced through the mechanism, and hence the redundancy value is the same for all values of α.
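
To make the role of the summation coefficient concrete, the following sketch tabulates the ordinary mutual information terms for independent fair dice (the case λ = 1) and all six coefficients. It only illustrates the isomorphism at α = 6 and the upper bounds I(R; D1) and I(R; D2) on any redundancy; it does not compute I_red, and the function names are illustrative assumptions.

```python
from itertools import product
from math import log2

def marginal(joint, idx):
    """Marginal pmf over the tuple positions listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(joint, a_idx, b_idx):
    """I(A; B) in bits; a_idx and b_idx select A and B from each outcome."""
    pa, pb = marginal(joint, a_idx), marginal(joint, b_idx)
    pab = marginal(joint, a_idx + b_idx)
    n = len(a_idx)
    return sum(p * log2(p / (pa[k[:n]] * pb[k[n:]]))
               for k, p in pab.items() if p > 0)

def dice_joint(alpha):
    """Joint pmf over (d1, d2, r) for independent fair dice, r = alpha*d1 + d2."""
    return {(d1, d2, alpha * d1 + d2): 1.0 / 36.0
            for d1, d2 in product(range(6), repeat=2)}

results = {}
for alpha in range(1, 7):
    joint = dice_joint(alpha)
    results[alpha] = (
        mutual_information(joint, (2,), (0,)),    # I(R; D1)
        mutual_information(joint, (2,), (1,)),    # I(R; D2)
        mutual_information(joint, (2,), (0, 1)),  # I(R; D1, D2) = H(R)
    )
    print(alpha, [round(v, 4) for v in results[alpha]])
```

For α = 6 the result R determines both dice, so I(R; D1, D2) reaches log2(36) bits and each die contributes log2(6) bits; for α = 1 the sum loses part of the joint information.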




FIG. 8. Plot of the redundant information I_red(R; D1, D2) depending on the correlation λ between the two dice D1 and D2. From top to bottom the summation coefficient is α = 1, ..., 6. It can be seen that for independent dice (λ = 1) the amount of redundancy depends on the mechanism that is used to sum the results, whereas at the other extreme all redundancy comes from the correlation of the sources.

5. Composition of Mechanisms

The last three examples from [15] are compositions of the examples already shown. The first one, RdnXor, combines the redundant copy example (λ = 0) with an XOR gate: (X, W) and (Y, W) are the inputs and Z = (W, X ⊕ Y) is the output. With our redundancy measure, this results in the required composite of one bit of redundant and one bit of synergistic information, the same as measured with I_min.

The second example, RdnUnqXor, combines an XOR gate with the two extremal copy cases. The inputs are (X1, X2, W) and (Y1, Y2, W), all independent and uniformly distributed. The output is Z = (X1 ⊕ Y1, (X2, Y2), W). Here we get the intended 1 bit of information in every partial information term, i.e. 1 bit of redundant information, 1 bit of synergistic information and 1 bit of unique information per input, and a total of 4 bits of mutual information.

The third example, XorAnd, combines an XOR gate with an And gate, i.e. Z = (X ∧ Y, X ⊕ Y). This obviously leads to a different result than in [15], as the same effect of mechanistic redundancy appears as in the And gate, as mentioned in Section IV C 3.

6. Summary

In summary, these examples show that I_red captures the proposed concept of redundancy very well. Furthermore, the resulting decomposition is in agreement with the desired examples in [15], except for the case where what we call mechanistic redundancy appears, which was not accounted for in the comparison of current measures of synergy. TABLE I summarises the comparison of I_min and I_red.

TABLE I. Summary of the bivariate redundancy examples. Results for the calculations of the examples using I_red and I_min, as well as the expected value that results from considerations of the desired properties of a redundancy measure.

Example               Expected   I_red     I_min
Copy (λ = 0) / Rdn    1          1         1
Copy (λ = 1) / Unq    0          0         1
XOR                   0          0         0
And                   0.311      0.311     0.311
RdnXor                1          1         1
RdnUnqXor             1          1         2
XorAnd                0.5        0.5       0.5
Copy (λ < 1)          I(X;Y)     I(X;Y)    1



D. Information Transfer 




In [31] the partial information decomposition is used to introduce new measures of information transfer. The measures are based on a decomposition of transfer entropy. Transfer entropy, introduced by Schreiber [24], is defined for two random processes X_t and Y_t as



T_{Y \to X} = I(X_{t+1}; Y_t \mid X_t). \qquad (47)



It measures the influence of the process Y at time t on 
the state of the process X in the next time step. One 
can also take a longer history instead of Yt and Xt into 
account. Conditional mutual information is defined as 




I(X_{t+1}; Y_t \mid X_t) = I(X_{t+1}; Y_t, X_t) - I(X_{t+1}; X_t). \qquad (48)

FIG. 9. PI-diagram for the decomposition of transfer entropy into PI-atoms. The coloured areas denote the transfer entropy.

As the conditional mutual information is the difference of two mutual information terms, the PI-decomposition can be used to decompose transfer entropy into two non-negative components. The decomposition is illustrated in FIG. 9. Let
R= {Xt,Yt} then it follows from ([Ml) and JSZ]) that 

T_{Y \to X} = \Pi_R(X_{t+1}; \{Y_t\}) + \Pi_R(X_{t+1}; \{X_t, Y_t\}). \qquad (49)

The first term denotes all information that uniquely comes from Y_t, called State Independent Transfer Entropy (SITE) by Williams and Beer [31]. The second term, on the other hand, denotes information that comes from Y_t but depends on the state of X_t, and is thus called State Dependent Transfer Entropy (SDTE) in [31]. We now apply both measures, I_min (with corresponding PI-atoms Π_R) and I_red (with corresponding PI-atoms Π'_R), as the underlying redundancy measure for the decomposition and compare the results.

We will consider two examples to show the difference in the decomposition when using I_red instead of I_min. The first one revisits an example from [31] where X and Y are two binary, coupled Markov random processes. The process Y is uniformly i.i.d. and X_{t+1} = Y_t if X_t = 0; moreover

p(x_{t+1} = y_t \mid x_t = 1) = 1 - d, \qquad (50)
p(x_{t+1} = 1 - y_t \mid x_t = 1) = d. \qquad (51)

So d ∈ [0, 1] controls whether there is any dependence on the previous state of X. If d vanishes, X is simply a copy of Y. For this example, d = 0 shows only state-independent transfer while d = 1 shows only state-dependent transfer and, most importantly, the decompositions of transfer entropy using either measure (I_red, I_min) coincide (compare with FIG. 10).
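
The total transfer entropy of this first process can be evaluated directly from (48). The sketch below assumes that X_t is uniformly distributed and independent of Y_t, which holds in the stationary regime because Y is uniform and i.i.d.; it computes only the total transfer entropy, not its split into SITE and SDTE, and the function names are illustrative.

```python
from math import log2

def marginal(joint, idx):
    """Marginal pmf over the tuple positions listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(joint, a_idx, b_idx):
    """I(A; B) in bits; a_idx and b_idx select A and B from each outcome."""
    pa, pb = marginal(joint, a_idx), marginal(joint, b_idx)
    pab = marginal(joint, a_idx + b_idx)
    n = len(a_idx)
    return sum(p * log2(p / (pa[k[:n]] * pb[k[n:]]))
               for k, p in pab.items() if p > 0)

def first_process_joint(d):
    """Joint pmf over (x_t, y_t, x_next); x_t and y_t assumed uniform, independent."""
    joint = {}
    for x in (0, 1):
        for y in (0, 1):
            for x_next in (0, 1):
                if x == 0:
                    cond = 1.0 if x_next == y else 0.0   # plain copy of y_t
                else:
                    cond = (1.0 - d) if x_next == y else d  # Eqs. (50), (51)
                if cond > 0:
                    joint[(x, y, x_next)] = 0.25 * cond
    return joint

def transfer_entropy(d):
    """T_{Y->X} = I(X_next; Y_t, X_t) - I(X_next; X_t), following Eq. (48)."""
    joint = first_process_joint(d)
    return (mutual_information(joint, (2,), (1, 0))
            - mutual_information(joint, (2,), (0,)))

for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(d, round(transfer_entropy(d), 4))
```

At both d = 0 (pure copy) and d = 1 (an XOR of the two states) the total transfer entropy is one bit, so the total alone cannot distinguish the two regimes; that distinction is exactly what the SITE/SDTE decomposition provides.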

The second example, though constructed for this specific purpose, is more intricate. First of all it shows the difference between the two measures, but it is also a good example of the subtlety of redundancy in mechanisms. Let us consider the following two processes (X_t, Y_t) and Z_t, where the Z_t are uniformly i.i.d. random variables, X_{t+1} is a copy of X_t and

p(y_{t+1} \mid y_t, z_t) = (1 - d)\,\delta_{y_t, y_{t+1}} + d\,\delta_{z_t, y_{t+1}}. \qquad (52)

The process Y_t copies with probability d the value of Z_{t-1} and with probability (1 − d) the value of Y_{t-1}. We now measure the transfer entropy T_{Z→(X,Y)}; see FIG. 11 for a Bayesian network of the process.

FIG. 10. Decomposition of transfer entropy T_{Y→X} for the first example process. The plot shows SITE and SDTE, each computed using I_min and using I_red, as functions of d. It can be seen that both decompositions coincide for this process.

It can be seen in FIG. 12 that the two decompositions coincide for d < 0.5. For d = 0 the two processes are completely independent, which is reflected in the vanishing overall transfer entropy in this case. At the other extreme, d = 1, the decomposition using I_red gives complete state-independent transfer entropy while the decomposition using I_min sees total state-dependent transfer entropy. In this case the decompositions disagree completely and we argue that our measure reflects the process much better. With d = 1 the process always copies Z_t to Y_{t+1}, which is completely independent of (X_t, Y_t). Specifically, I_min mistakenly sees redundancy between X_t and Z_t in the evolution of one timestep. Following (39) and (37), this is then reflected in the vanishing state-independent transfer entropy for all d (larger redundancy means more synergy and less unique information, given that the mutual information stays constant).

FIG. 11. Bayesian network of the second example process. X_t is a parallel and independent process; the only information transfer between the processes is from Z_t to Y_{t+1}.

The reason why I_min measures more redundancy is the same as the reason why I_min measures redundancy between independent X and Y with respect to Z = (X, Y): it compares changes in different directions in the space of distributions. The parallel and independent process X_t lets I_min see a dependency between the two processes X_t and Z_t that does not exist. If we consider the transfer entropy T_{Z→Y} from Z_t to Y_t only, ignoring the process X_t completely, we can see in FIG. 13 that the decomposition using I_min now coincides with the decomposition of T_{Z→(X,Y)} using I_red in FIG. 12.

Nonetheless, we have not yet explained the quite unusual non-differentiable shape of the state-independent transfer entropy, which is positive only for d > 0.5. This is surprising because up to d = 0.5 all transfer entropy is considered to be state-dependent, even though with probability d the state of Y_{t+1} takes on the state of Z_t. As the process X_t was only used to demonstrate that using I_min for the decomposition measures state dependencies in the transfer entropy that are not there, we will now leave X_t aside and only consider the process (Y_t, Z_t) as described above.

To understand the shape of the graph of the state-dependent transfer entropy of this process, we need to have a look at the mutual information I(Y_{t+1}; Z_t) and the redundancy I_red(Y_{t+1}; Y_t, Z_t), both shown in FIG. 14. From the decomposition it follows that the state-independent transfer entropy (shown in FIG. 12 and in FIG. 13) is now the difference of these two terms (compare with FIG. 9).

The increase of the mutual information I(Y_{t+1}; Z_t) is obvious from the definition of the process. For d = 0 we have independence between both processes and for d = 1 we have Y_{t+1} = Z_t. It is also clear that the redundant information with respect to Y_{t+1} needs to be zero at the extremal points d ∈ {0, 1}, because at these points the value of Y_{t+1} depends either on Y_t (d = 0) or on Z_t (d = 1), and therefore either I(Y_{t+1}; Z_t) = 0 or I(Y_{t+1}; Y_t) = 0, both of which are upper bounds for the redundancy.

FIG. 12. Decomposition of transfer entropy T_{Z→(X,Y)} for the second example process. The plot shows SITE and SDTE, each computed using I_min and using I_red, as functions of d.

FIG. 13. Decomposition of transfer entropy T_{Z→Y} for the second example process. The plot shows SITE and SDTE using I_min as functions of d.

FIG. 14. The plot shows I(Y_{t+1}; Z_t) and I_red(Y_{t+1}; Y_t, Z_t) for the second example process as functions of d.

On the other hand, for d = 0.5 the state of either process at time t tells us something about the distribution of Y_{t+1}, and because the space of distributions of Y_{t+1} is one-dimensional, this must be information about a change in the same direction, so there is positive redundancy. Observing one of the outcomes necessarily contributes to some extent to the prediction of the outcome of Y_{t+1}. We can now show this more rigorously. We have

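p(y_{t+1} \mid y_t) = \frac{d}{2}\,\delta_{y_{t+1},(1-y_t)} + \left(1 - \frac{d}{2}\right)\delta_{y_{t+1},y_t}, \qquad (53)

p(y_{t+1} \mid z_t) = \frac{1-d}{2}\,\delta_{y_{t+1},(1-z_t)} + \frac{1+d}{2}\,\delta_{y_{t+1},z_t} \qquad (54)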

as the conditional distributions given the current state of either Y_t or Z_t. To calculate I_red(Y_{t+1}; Y_t, Z_t) we need to calculate the projected informations I^π_{Y_{t+1}}(Z_t ↘ Y_t) and I^π_{Y_{t+1}}(Y_t ↘ Z_t); the redundancy is the minimum of both terms. Because the space of distributions Δ(Y_{t+1}) is one-dimensional (it is simply the unit interval), we can make a simple illustrative argument to compute p_{(z_t=0 ↘ Y_t)}, p_{(z_t=1 ↘ Y_t)}, p_{(y_t=0 ↘ Z_t)} and p_{(y_t=1 ↘ Z_t)}, which are the terms needed to calculate the projected information. From the illustration in FIG. 15 it can be seen that for d < 0.5, p_{(z_t=0 ↘ Y_t)}(y_{t+1}) = p_{(y_t=0 ↘ Z_t)}(y_{t+1}) = p(y_{t+1} | z_t = 0) and p_{(z_t=1 ↘ Y_t)}(y_{t+1}) = p_{(y_t=1 ↘ Z_t)}(y_{t+1}) = p(y_{t+1} | z_t = 1). Inserting this into the projected information, we get I^π_{Y_{t+1}}(Z_t ↘ Y_t) = I^π_{Y_{t+1}}(Y_t ↘ Z_t) = I(Y_{t+1}; Z_t) for d < 0.5.

Conversely, for d > 0.5 we get I^π_{Y_{t+1}}(Z_t ↘ Y_t) = I^π_{Y_{t+1}}(Y_t ↘ Z_t) = I(Y_{t+1}; Y_t). As I(Y_{t+1}; Z_t) and I(Y_{t+1}; Y_t) are perfectly symmetric, this explains the form of the redundant information in FIG. 14. Thus, even though Z_t and Y_t are completely independent, the mechanism, which is a random read-out (with distribution (d, 1 − d)), creates redundancy with respect to Y_{t+1}. Furthermore, this explains why we have no state-independent transfer entropy for d < 0.5.
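
The curves discussed above can be traced numerically. The sketch below builds the joint distribution of (Y_t, Z_t, Y_{t+1}) from the update rule (52), assuming Y_t is uniform and independent of Z_t (the stationary case), and evaluates both mutual information terms. Taking the smaller of the two as the redundancy follows the argument given in the text for this particular binary process only; it is not a general implementation of I_red, and the function names are illustrative.

```python
from itertools import product
from math import log2

def marginal(joint, idx):
    """Marginal pmf over the tuple positions listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(joint, a_idx, b_idx):
    """I(A; B) in bits; a_idx and b_idx select A and B from each outcome."""
    pa, pb = marginal(joint, a_idx), marginal(joint, b_idx)
    pab = marginal(joint, a_idx + b_idx)
    n = len(a_idx)
    return sum(p * log2(p / (pa[k[:n]] * pb[k[n:]]))
               for k, p in pab.items() if p > 0)

def second_process_joint(d):
    """Joint pmf over (y_t, z_t, y_next) following the update rule (52)."""
    joint = {}
    for y, z, y_next in product((0, 1), repeat=3):
        cond = (1.0 - d) * (y_next == y) + d * (y_next == z)
        if cond > 0:
            joint[(y, z, y_next)] = 0.25 * cond
    return joint

def curves(d):
    """Return I(Y_next; Z_t), I(Y_next; Y_t), their minimum, and the SITE."""
    joint = second_process_joint(d)
    i_z = mutual_information(joint, (2,), (1,))
    i_y = mutual_information(joint, (2,), (0,))
    redundancy = min(i_z, i_y)  # per the text's argument for this process
    return i_z, i_y, redundancy, i_z - redundancy

for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(d, [round(v, 4) for v in curves(d)])
```

The state-independent part, the last entry of each row, stays at zero up to d = 0.5 and grows afterwards, which reproduces the kink described in the text.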



1. Open Loop Controllability 



Ashby 0] proposed and Touchette and Lloyd 27 1 con- 
firmed that there is a natural link between control the- 
ory and information theory. As shown by Touchette and 
Lloyd [1^, for a process, with initial state X and final 
state X', and a controller C which are linked by the probability distribution p(x'|x, c), the conditional mutual information I(X'; C | X) (which is the transfer entropy from the controller to the system) is a measure of controllability. Williams and Beer show in [31] that the decomposition of transfer entropy using I_min as a redundancy measure has a close relation to the notion of open-loop controllability. We will now show that this is still the case if I_red is used to decompose transfer entropy.



Perfect controllability, as defined in [28], means that for all initial states x ∈ X and final states x' ∈ X there exists a control state c ∈ C such that p(x'|x, c) = 1. The following equivalence is then shown in [31]:

Lemma 8. A system is perfectly controllable iff for any 
x' there exists a distribution p{c\x) such that p{x') = 1 
for any distribution p{x). 
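
The definition of perfect controllability is easy to check mechanically for small systems. The following sketch tests it for three toy binary systems and also tests whether the transition kernel ignores the initial state, which is the condition used in Lemma 10 below. The kernels and function names are hypothetical illustrations, not examples taken from [28] or [31].

```python
from itertools import product

def is_perfectly_controllable(kernel, states, controls, tol=1e-12):
    """True if for every (x, x_next) some control c gives p(x_next | x, c) = 1."""
    return all(
        any(abs(kernel(x_next, x, c) - 1.0) <= tol for c in controls)
        for x, x_next in product(states, repeat=2)
    )

def is_state_independent(kernel, states, controls, tol=1e-12):
    """True if p(x_next | x, c) does not depend on the initial state x."""
    return all(
        abs(kernel(x_next, x, c) - kernel(x_next, states[0], c)) <= tol
        for x_next, x, c in product(states, states, controls)
    )

def set_kernel(x_next, x, c):
    """The controller overwrites the state: x' = c."""
    return 1.0 if x_next == c else 0.0

def xor_kernel(x_next, x, c):
    """The controller flips the state when c = 1: x' = x XOR c."""
    return 1.0 if x_next == (x ^ c) else 0.0

def hold_kernel(x_next, x, c):
    """The controller has no effect: x' = x."""
    return 1.0 if x_next == x else 0.0

states = (0, 1)
controls = (0, 1)
for name, kernel in (("set", set_kernel), ("xor", xor_kernel), ("hold", hold_kernel)):
    print(name,
          is_perfectly_controllable(kernel, states, controls),
          is_state_independent(kernel, states, controls))
```

The XOR system is perfectly controllable only if the controller knows the current state, whereas the overwriting system is controllable without that knowledge; this is the distinction that open-loop controllability formalises.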

It also follows that if a system is perfectly controllable, there exists an x' such that p(x'|x) = 1 for each x ∈ X; see [31] for a proof. Now, a system has perfect open-loop controllability iff it has perfect controllability and I(X; C) = 0. Moreover, in [31] it is shown that the following theorem holds:

Theorem 9 (Williams and Beer). A system is perfectly open-loop controllable iff it is perfectly controllable with vanishing state-dependent transfer entropy (using I_min) from C to X'.

We will now show that this theorem still holds when the decomposition uses our measure of redundant information I_red. To prove the theorem we will use the following lemma. It is shown in [31] that the condition of the lemma is fulfilled for any perfect open-loop controller, which proves the direct part of the theorem (perfect open-loop controllability implies perfect controllability with zero SDTE using I_red as a redundancy measure):

Lemma 10. If

p(x' \mid x, c) = p(x' \mid c) \quad \forall\, x', x \in X,\; c \in C,

then the SDTE from C to X' is zero.

Proof From ^ and ^ it follows that 

\Pi'(X'; \{C, X\}) \le I(X'; X, C) - I(X'; X) - I(X'; C) + I^{\pi}_{X'}(X \searrow C). \qquad (55)

The synergy is non-negative, and the right-hand side can be reformulated as in (43). But with p(x'|x, c) = p(x'|c) ∀ x, x' ∈ X, c ∈ C the positive Kullback-Leibler divergences in this reformulation all vanish. Therefore Π'(X'; {C, X}) = 0. □

For the converse direction (perfect controllability and vanishing SDTE from C to X' imply perfect open-loop controllability), we first need to prove the following lemma:

Lemma 11. If a system is perfectly controllable with a distribution p(c|x) then I_red(X'; X, C) = 0.

Proof. From Lemma 8 it follows that p(x') = 1 for some x' ∈ X as well as p(x'|x) = 1 for all x ∈ X, and therefore C_cl(⟨X⟩_{X'}) in Δ(X') is just {p(x')}, which implies I^π_{X'}(C ↘ X) = 0. Thus it follows that I_red(X'; X, C) = 0. □
FIG. 15. Illustration of the conditional distributions of Y_{t+1} for the second example process in the two cases (a) d < 0.5 and (b) d > 0.5. The line represents the one-dimensional simplex, i.e. the space of probability distributions over Y_{t+1}, denoted by Δ(Y_{t+1}), where Y_{t+1} is a binary valued random variable; its end points are p(y_{t+1} = 0) = 1 and p(y_{t+1} = 1) = 1. The black diamond represents the marginal distribution p(y_{t+1}) and the shaded diamonds the conditionals given specific values of Y_t and Z_t. It can be seen that the projections are always equal to the conditional distributions closer to the marginal of Y_{t+1}. In particular, the projections are the same no matter in which direction the projection is done (from Y_t to Z_t or vice versa).



Thus, for the converse direction, starting with perfect controllability and vanishing SDTE, we have the following equality:

0 = \Pi'(X'; \{C, X\}) \qquad (56)
  = I(X'; X, C) - I(X'; X) - I(X'; C) + I_{\mathrm{red}}(X'; X, C) \qquad (57)
  = I(X'; X, C) - I(X'; X) - I(X'; C) \qquad (58)
  = \sum_{x', x, c} p(x', x, c) \log \frac{p(x' \mid x, c)\, p(x')}{p(x' \mid c)\, p(x' \mid x)} \qquad (59)

and, as we also have p(x'|x) = p(x') because of perfect controllability,

  = \sum_{x', x, c} p(x', x, c) \log \frac{p(x' \mid x, c)}{p(x' \mid c)}. \qquad (60)

The right-hand side of (60) is the conditional mutual information I(X'; X | C), which vanishes only if p(x'|x, c) = p(x'|c) whenever p(x, c) > 0. We also know that for every x ∈ X there exists x' ∈ X and c ∈ C such that p(x'|x, c) = 1. Thus for any x' ∈ X there exists a c ∈ C such that p(x'|c) = 1. It is shown in [31] that this is equivalent to open-loop controllability.

Hence, we have shown that Theorem 9 also holds if we apply I_red as the underlying redundancy measure, and the relation between open-loop controllability and the decomposition of transfer entropy carries over to our new measure.



V. DISCUSSION 

The motivation for this paper was to overcome the 
shortcomings of current measures of redundancy and syn- 
ergy. We introduced a new measure for bivariate redun- 
dant information. Redundant information between two 
random variables is information that is shared between 



two variables. In contrast to mutual information, redun- 
dant information denotes information with respect to the 
outcome of a third variable. Our measure is conceptually 
motivated by measuring similarities in the direction of 
change in the outcome distribution, depending on which 
input is observed. We proved that the construction ad- 
heres to properties of redundancy as stated in the litera- 
ture, and can be used for a non-negative decomposition 
of mutual information. The measure is closely related 
to the concept of minimal information as introduced in 

We demonstrated in several examples that I_red follows several intuitions about redundancy. Furthermore, it is possible to decompose transfer entropy as considered in [31]; in particular, we showed that using minimal information instead of redundant information to decompose transfer entropy can lead to the detection of spurious state-dependent transfer entropy. We were able to prove that the results about open-loop controllability from [31] are also applicable to the decomposition using I_red. Thus our measure is able to serve as a replacement for the bivariate version of minimal information.

A particular insight of our definition is the emphasis on mechanisms in the concept of redundant information, which has been rather neglected in the literature so far. Firstly, we linked bivariate redundant information in the case of a copying mechanism to the mutual information between the input variables. We identify redundant information that already appears in the inputs with source redundancy, in contrast to redundant information that is only due to the mechanism, as demonstrated in the And gate or the 50:50 read-out. We identify this latter kind of redundancy with mechanistic redundancy. This is in contrast to the redundancy measure proposed in [15], which does not capture mechanistic redundancy. The separation of both kinds of redundancy is not explicit at this point, and we do not yet propose a clear and obvious separation of mechanistic and source contributions of redundant information.

Future work will show whether it is possible to separate 
the two concepts of mechanistic and source redundancy 
when they appear simultaneously. Another limitation 
we currently have is the restriction to a bivariate mea- 
sure. In general, however, there are applications where it 
is interesting to be able to compute redundant informa- 
tion between more than two variables [ll,!!^]- However, 
the geometric structure for this problem gets significantly 
more complex, and it is, for example, not entirely clear 
by what the identity property should be replaced in the 
multivariate case. There are several ways to generalize 
mutual information to a multivariate measure, none of 
which seems to be fitting in this case. The construction 



of a multivariate measure of redundant information, as 
well as a generalization to continuous random variables 
is part of ongoing research. 